Do you pay for Privacy in Online learning?
Online learning, in the mistake-bound model, is one of the most fundamental
concepts in learning theory. Differential privacy, meanwhile, is the most
widely used statistical notion of privacy in the machine learning community.
Characterizing the learning problems that are learnable online under
differential privacy is therefore of great interest. In this paper, we ask
whether the two notions are equivalent from a learning perspective, i.e., is
privacy for free in the online learning framework?
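To make the setting concrete, here is a minimal sketch of the mistake-bound protocol, using the classic halving algorithm over a finite hypothesis class in the realizable case. This illustrates the model only, not this paper's construction; all names below are ours.

def halving_learner(hypotheses, stream):
    """hypotheses: callables x -> 0/1; stream: iterable of (x, y) pairs."""
    version_space = list(hypotheses)
    mistakes = 0
    for x, y in stream:
        votes = sum(h(x) for h in version_space)
        prediction = int(2 * votes >= len(version_space))  # majority vote
        if prediction != y:
            mistakes += 1
        # keep only the hypotheses consistent with the revealed label
        version_space = [h for h in version_space if h(x) == y]
    return mistakes

# Example: integer thresholds; the target (t = 4) is in the class, so the
# learner makes at most log2(10) mistakes on any stream.
hypotheses = [lambda x, t=t: int(x >= t) for t in range(10)]
stream = [(x, int(x >= 4)) for x in [1, 7, 3, 9, 4, 2]]
print(halving_learner(hypotheses, stream))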
Inverse Reinforcement Learning from a Gradient-based Learner
Inverse Reinforcement Learning addresses the problem of inferring an expert's
reward function from demonstrations. However, in many applications, we not only
have access to the expert's near-optimal behavior, but we also observe part of
her learning process. In this paper, we propose a new algorithm for this
setting, in which the goal is to recover the reward function being optimized by
an agent, given a sequence of policies produced during learning. Our approach
is based on the assumption that the observed agent is updating her policy
parameters along the gradient direction. We then extend our method to handle
the more realistic scenario where we only have access to a dataset of learning
trajectories. For both settings, we provide theoretical insights into our
algorithms' performance. Finally, we evaluate the approach in a simulated
GridWorld environment and on MuJoCo environments, comparing it with a
state-of-the-art baseline.
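The gradient assumption lends itself to a simple estimator. The sketch below is illustrative, not the paper's exact method: assuming a reward that is linear in known features, each observed parameter step theta_{t+1} - theta_t = alpha * G(theta_t)^T w is linear in the unknown reward weights w, so w can be recovered by least squares. Here feature_gradients is a hypothetical routine supplying the per-feature policy gradients.

import numpy as np

def recover_reward_weights(thetas, feature_gradients, alpha=1.0):
    """thetas: (T+1, d) array of observed policy parameters.
    feature_gradients: theta -> (k, d) matrix whose i-th row is the policy
    gradient of the i-th reward feature at theta (assumed computable)."""
    rows, targets = [], []
    for t in range(len(thetas) - 1):
        G = feature_gradients(thetas[t])            # (k, d)
        rows.append(alpha * G.T)                    # each step is linear in w
        targets.append(thetas[t + 1] - thetas[t])
    A = np.vstack(rows)          # (T*d, k)
    b = np.concatenate(targets)  # (T*d,)
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    return w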
Active Exploration for Inverse Reinforcement Learning
Inverse Reinforcement Learning (IRL) is a powerful paradigm for inferring a
reward function from expert demonstrations. Many IRL algorithms require a known
transition model and sometimes even a known expert policy, or they at least
require access to a generative model. However, these assumptions are too strong
for many real-world applications, where the environment can be accessed only
through sequential interaction. We propose a novel IRL algorithm: Active
exploration for Inverse Reinforcement Learning (AceIRL), which actively
explores an unknown environment and expert policy to quickly learn the expert's
reward function and identify a good policy. AceIRL uses previous observations
to construct confidence intervals that capture plausible reward functions and
find exploration policies that focus on the most informative regions of the
environment. AceIRL is the first approach to active IRL with sample-complexity
bounds that does not require a generative model of the environment. AceIRL
matches the sample complexity of active IRL with a generative model in the
worst case. Additionally, we establish a problem-dependent bound that relates
the sample complexity of AceIRL to the suboptimality gap of a given IRL
problem. We empirically evaluate AceIRL in simulations and find that it
significantly outperforms more naive exploration strategies.
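As a toy illustration of the confidence-interval idea (not AceIRL's actual construction), one can keep visit counts per state-action pair, attach Hoeffding-style widths to the resulting estimates, and steer exploration toward the pairs whose intervals are widest; all names below are ours.

import numpy as np

def exploration_widths(counts, delta=0.1):
    """counts: (S, A) visit counts; returns Hoeffding-style interval widths."""
    n = np.maximum(counts, 1)
    return np.sqrt(np.log(2.0 / delta) / (2.0 * n))

def most_informative_pairs(counts, top_k=5):
    """Return the top_k (state, action) pairs with the widest intervals."""
    widths = exploration_widths(counts)
    flat = np.argsort(widths, axis=None)[::-1][:top_k]
    return [np.unravel_index(i, counts.shape) for i in flat]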
Provably Learning Nash Policies in Constrained Markov Potential Games
Multi-agent reinforcement learning (MARL) addresses sequential
decision-making problems with multiple agents, where each agent optimizes its
own objective. In many real-world instances, the agents may not only want to
optimize their objectives, but also ensure safe behavior. For example, in
traffic routing, each car (agent) aims to reach its destination quickly
(objective) while avoiding collisions (safety). Constrained Markov Games (CMGs)
are a natural formalism for safe MARL problems, though generally intractable.
In this work, we introduce and study Constrained Markov Potential Games
(CMPGs), an important class of CMGs. We first show that a Nash policy for CMPGs
can be found via constrained optimization. A tempting approach is to solve it
with Lagrangian-based primal-dual methods. However, as we show, in contrast to
the single-agent setting, CMPGs do not satisfy strong duality, rendering such
approaches inapplicable and potentially unsafe. To solve the CMPG problem, we
propose the algorithm Coordinate-Ascent for CMPGs (CA-CMPG), which provably
converges to a Nash policy in tabular, finite-horizon CMPGs. Furthermore, we
provide the first sample-complexity bounds for learning Nash policies in
unknown CMPGs, which, under additional assumptions, guarantee safe
exploration.
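As a rough sketch of the coordinate-ascent pattern (generic, not necessarily CA-CMPG's exact update rule): cycle through the agents, let each best-respond while the others' policies are frozen, and accept an update only if the shared potential improves. Both best_response (a hypothetical single-agent constrained solver) and potential are assumptions of this sketch.

def coordinate_ascent(policies, best_response, potential, tol=1e-6, max_rounds=100):
    """policies: list of per-agent policies; potential: joint policies -> float."""
    for _ in range(max_rounds):
        improved = False
        for i in range(len(policies)):
            candidate = best_response(i, policies)  # agent i's constrained best response
            new_policies = policies[:i] + [candidate] + policies[i + 1:]
            if potential(new_policies) > potential(policies) + tol:
                policies = new_policies
                improved = True
        if not improved:
            break  # no agent can improve: an approximate Nash policy
    return policies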
Learning in Non-Cooperative Configurable Markov Decision Processes
The Configurable Markov Decision Process framework includes two entities: a Reinforcement Learning agent and a configurator that can modify some environmental parameters to improve the agent's performance. This presupposes that the two actors share the same reward function. What if the configurator does not have the same intentions as the agent? This paper introduces the Non-Cooperative Configurable Markov Decision Process, a setting that allows the configurator and the agent to have two (possibly different) reward functions. We then consider an online learning problem in which the configurator has to find the best among a finite set of possible configurations. We propose two learning algorithms that minimize the configurator's expected regret by exploiting the problem's structure, depending on the feedback the agent provides. While a naive application of the UCB algorithm yields regret that grows indefinitely over time, we show that our approach suffers only bounded regret. Finally, we empirically demonstrate the performance of our algorithms in simulated domains.
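For contrast, here is a generic UCB1 sketch over a finite set of configurations; this is the kind of naive baseline the abstract refers to, whose regret grows logarithmically without bound, not the authors' bounded-regret algorithm. run_episode is a hypothetical routine that deploys a configuration and returns the configurator's observed reward.

import math

def ucb_configuration_selection(num_configs, run_episode, horizon):
    counts = [0] * num_configs
    means = [0.0] * num_configs
    for t in range(1, horizon + 1):
        if t <= num_configs:
            c = t - 1  # play each configuration once first
        else:
            c = max(range(num_configs),
                    key=lambda i: means[i] + math.sqrt(2 * math.log(t) / counts[i]))
        reward = run_episode(c)
        counts[c] += 1
        means[c] += (reward - means[c]) / counts[c]  # incremental mean update
    return max(range(num_configs), key=lambda i: means[i])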
Demo: JoyTag: a battery-less videogame controller exploiting RFID backscattering
This demo presents our experience in developing a joystick for videogames that uses RFID backscattering for battery-free operation. Specifically, we develop a system to gather data from a wireless, battery-less joystick, named JoyTag, while it interacts with a videogame console. Our system lets consumers use JoyTag at any time without worrying about charging.